21 research outputs found
Learning mixed graphical models with separate sparsity parameters and stability-based model selection
Background: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. MGMs consist of a parameterized joint probability density, which implies a network structure over these heterogeneous variables. The network structure reveals direct associations between the variables and the joint probability density allows one to ask arbitrary probabilistic questions on the data. This information can be used for feature selection, classification and other important tasks. Results: We studied the properties of MGM learning and applications of MGMs to high-dimensional data (biological and simulated). Our results show that MGMs reliably uncover the underlying graph structure, and when used for classification, their performance is comparable to popular discriminative methods (lasso regression and support vector machines). We also show that imposing separate sparsity penalties for edges connecting different types of variables significantly improves edge recovery performance. To choose these sparsity parameters, we propose a new efficient model selection method, named Stable Edge-specific Penalty Selection (StEPS). StEPS is an expansion of an earlier method, StARS, to mixed variable types. In terms of edge recovery, StEPS-selected MGMs outperform those models selected using standard techniques, including AIC, BIC and cross-validation. In addition, we use a heuristic search that is linear in the size of the sparsity value search space, as opposed to the cubic grid search required by other model selection methods. We applied our method to clinical and mRNA expression data from the Lung Genomics Research Consortium (LGRC) and the learned MGM correctly recovered connections between the diagnosis of obstructive or interstitial lung disease, two diagnostic breathing tests, and cigarette smoking history.
Our model also suggested biologically relevant mRNA markers that are linked to these three clinical variables. Conclusions: MGMs are able to accurately recover dependencies between sets of continuous and discrete variables in both simulated and biomedical datasets. Separation of sparsity penalties by edge type is essential for accurate network edge recovery. Furthermore, our stability-based method for model selection determines sparsity parameters faster and more accurately (in terms of edge recovery) than other model selection methods. With the ongoing availability of comprehensive clinical and biomedical datasets, MGMs are expected to become a valuable tool for investigating disease mechanisms and answering an array of critical healthcare questions.
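The stability-based selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the StARS-style instability measure 2θ(1 − θ) over edge selection frequencies θ, a threshold `gamma` of 0.05, and adjacency matrices already estimated on subsamples by some graph learner. The function names, the `mask` argument for restricting to one edge type, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np

def edge_instability(adjs, mask=None):
    """Mean instability over edges for a stack of subsample graphs.

    adjs: array of shape (n_subsamples, p, p) holding symmetric 0/1 adjacency matrices.
    mask: optional (p, p) boolean array restricting the average to one edge type
          (for example continuous-discrete edges); an assumption of this sketch.
    """
    adjs = np.asarray(adjs, dtype=float)
    theta = adjs.mean(axis=0)              # how often each edge was selected
    xi = 2.0 * theta * (1.0 - theta)       # per-edge instability, maximal at theta = 0.5
    upper = np.triu(np.ones_like(theta, dtype=bool), k=1)  # count each edge once
    if mask is not None:
        upper &= np.asarray(mask, dtype=bool)
    return float(xi[upper].mean()) if upper.any() else 0.0

def select_penalty(instability_by_lambda, gamma=0.05):
    """Return the smallest penalty whose monotonized instability is <= gamma.

    Penalties are scanned from sparsest (largest) to densest (smallest), and the
    instability is replaced by its running maximum so that it never decreases.
    If no penalty qualifies, the largest penalty is returned.
    """
    lambdas = sorted(instability_by_lambda, reverse=True)
    running_max = np.maximum.accumulate([instability_by_lambda[l] for l in lambdas])
    chosen = lambdas[0]
    for lam, inst in zip(lambdas, running_max):
        if inst <= gamma:
            chosen = lam
    return chosen
```

Under these assumptions, selecting a separate penalty per edge type amounts to calling `edge_instability` with a different `mask` for each type and running `select_penalty` on each resulting curve, which keeps the search linear in the number of candidate penalties rather than requiring a full grid over all combinations.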
Transcriptional Programming of Normal and Inflamed Human Epidermis at Single-Cell Resolution
© 2018 The Authors. Perturbations in the transcriptional programs specifying epidermal differentiation cause diverse skin pathologies ranging from impaired barrier function to inflammatory skin disease. However, the global scope and organization of this complex cellular program remain undefined. Here we report single-cell RNA sequencing profiles of 92,889 human epidermal cells from 9 normal and 3 inflamed skin samples. Transcriptomics-derived keratinocyte subpopulations reflect classic epidermal strata but also sharply compartmentalize epithelial functions such as cell-cell communication, inflammation, and WNT pathway modulation. In keratinocytes, ∼12% of assessed transcript expression varies in coordinate patterns, revealing undescribed gene expression programs governing epidermal homeostasis. We also identify molecular fingerprints of inflammatory skin states, including S100 activation in the interfollicular epidermis of normal scalp, enrichment of a CD1C+CD301A+ myeloid dendritic cell population in psoriatic epidermis, and IL1βhiCCL3hiCD14+ monocyte-derived macrophages enriched in foreskin. This compendium of RNA profiles provides a critical step toward elucidating epidermal diseases of development, differentiation, and inflammation. Cheng et al. report single-cell RNA sequencing of normal and inflamed human epidermis, revealing a discrete set of specialized keratinocytes that exhibit a distinct composition at different anatomic sites. Myeloid dendritic cells and macrophages also vary sharply with epidermal anatomic site and inflammation, indicating dynamic programming of antigen-presenting cells.
Integrated genomic and molecular characterization of cervical cancer
Cervical cancer remains one of the leading causes of cancer-related deaths worldwide. Here we report the extensive molecular characterization of 228 primary cervical cancers, the largest comprehensive genomic study of cervical cancer to date. We observed striking APOBEC mutagenesis patterns and identified SHKBP1, ERBB3, CASP8, HLA-A, and TGFBR2 as novel significantly mutated genes in cervical cancer. We also discovered novel amplifications in immune targets CD274/PD-L1 and PDCD1LG2/PD-L2, and the BCAR4 lncRNA that has been associated with response to lapatinib. HPV integration was observed in all HPV18-related cases and 76% of HPV16-related cases, and was associated with structural aberrations and increased target gene expression. We identified a unique set of endometrial-like cervical cancers, comprised predominantly of HPV-negative tumors with high frequencies of KRAS, ARID1A, and PTEN mutations. Integrative clustering of 178 samples identified Keratin-low Squamous, Keratin-high Squamous, and Adenocarcinoma-rich subgroups. These molecular analyses reveal new potential therapeutic targets for cervical cancers.